PageRank on Wikipedia: Towards General Importance Scores for Entities
Abstract
Link analysis methods are used to estimate importance in graph-structured data. In that realm, the PageRank algorithm has been used to analyze directed graphs, in particular the link structure of the Web. Recent developments in information retrieval focus on entities and their relations (i.e., knowledge graph panels). Many entities are documented in the popular knowledge base Wikipedia. The cross-references within Wikipedia form a directed graph that is suitable for computing PageRank scores as importance indicators for entities. In this work, we present different PageRank-based analyses of the link graph of Wikipedia and the corresponding experiments. We focus on the question of whether some links, based on their position in the article text, can be deemed more important than others. In our variants, we change the probabilistic impact of links in accordance with their position on the page and measure the effects on the output of the PageRank algorithm. We compare the resulting rankings and those of existing systems with pageview-based rankings and provide statistics on the pairwise computed Spearman and Kendall rank correlations.
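The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the position labels (`lead`, `body`), the weights in `POSITION_WEIGHT`, the toy link graph, and the pageview numbers are all invented for illustration, and the damping factor 0.85 is only the commonly used default. The sketch runs PageRank by power iteration with each outgoing link's transition probability proportional to its position weight, then compares the resulting ranking with a pageview ranking using Spearman's rho and Kendall's tau-a (both without tie correction).

```python
from collections import defaultdict
from itertools import combinations


def weighted_pagerank(links, damping=0.85, iterations=100, tol=1e-12):
    """Power-iteration PageRank over (source, target, weight) triples.

    A link's transition probability is its weight divided by the total
    outgoing weight of its source page. Rank held by pages without
    outgoing links is redistributed uniformly.
    """
    links = list(links)
    nodes = sorted({n for s, t, _ in links for n in (s, t)})
    out = defaultdict(dict)
    for s, t, w in links:
        out[s][t] = out[s].get(t, 0.0) + w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if v not in out)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for s, targets in out.items():
            total = sum(targets.values())
            for t, w in targets.items():
                new[t] += damping * rank[s] * w / total
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def ranks(scores):
    """Map each key to its rank position (1 = highest score); assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {v: i + 1 for i, v in enumerate(ordered)}


def spearman(a, b):
    """Spearman's rho for two score dicts over the same keys (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[v] - rb[v]) ** 2 for v in ra)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for u, v in combinations(list(a), 2):
        sign = (a[u] - a[v]) * (b[u] - b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical position weights: links in the lead section count double.
POSITION_WEIGHT = {"lead": 2.0, "body": 1.0}

# Invented toy link graph: (source article, target article, link position).
toy_links = [
    ("A", "B", "lead"), ("A", "C", "body"), ("B", "C", "lead"),
    ("C", "A", "body"), ("D", "C", "body"),
]
# Invented pageview counts, used only to demonstrate the comparison step.
toy_pageviews = {"A": 900, "B": 150, "C": 1200, "D": 40}

uniform = weighted_pagerank((s, t, 1.0) for s, t, _ in toy_links)
by_position = weighted_pagerank(
    (s, t, POSITION_WEIGHT[p]) for s, t, p in toy_links
)
for name, scores in (("uniform", uniform), ("position-weighted", by_position)):
    print(name, spearman(scores, toy_pageviews), kendall_tau(scores, toy_pageviews))
```

For graphs at the scale of Wikipedia, a sparse-matrix implementation and tie-aware correlation measures would be needed; the dictionaries above are only meant to make the weighting and comparison steps explicit.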
Similar resources
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...
A Dynamical System for PageRank with Time-Dependent Teleportation
We propose a dynamical system that captures changes to the network centrality of nodes as external interest in those nodes varies. We derive this system by adding time-dependent teleportation to the PageRank score. The result is not a single set of importance scores, but rather a time-dependent set. These can be converted into ranked lists in a variety of ways, for instance, by taking the largest ...
A Multilingual Entity Linker Using PageRank and Semantic Graphs
This paper describes HERD, a multilingual named entity recognizer and linker. HERD is based on the links in Wikipedia to resolve mappings between the entities and their different names, and Wikidata as a language-agnostic reference of entity identifiers. HERD extracts the mentions from text using a string matching engine and links them to entities with a combination of rules, PageRank, and feat...
Using Anchor Text, Spam Filtering and Wikipedia for Web Search and Entity Ranking
In this paper, we document our efforts in participating in the TREC 2010 Entity Ranking and Web Tracks. We had multiple aims: For the Web Track we wanted to compare the effectiveness of anchor text of the category A and B collections and the impact of global document quality measures such as PageRank and spam scores. For the Entity Ranking Track, we use Wikipedia as a pivot to find relevant ent...
Scientific citations in Wikipedia
The Internet-based encyclopædia Wikipedia has grown to become one of the most visited web-sites on the Internet. However, critics have questioned the quality of entries, and an empirical study has shown Wikipedia to contain errors in a 2005 sample of science entries. Biased coverage and lack of sources are among the “Wikipedia risks”. The present work describes a simple assessment of these aspe...
Publication date: 2016